There are three datasets involved in this project:
- Rent Ordinance Housing Inventory
- Fire Department Calls for Service
- Neighborhoods San Francisco 2004

Why did you choose this/these particular dataset(s)?¶
We chose these datasets because they provide valuable insights into two crucial aspects of our study: fire incidents and neighborhood economic levels in San Francisco.
The "Fire Department Calls for Service" dataset allows us to analyze fire patterns and response times, which are essential for understanding the effectiveness of emergency services. The "Rent Ordinance Housing Inventory" dataset documents rental market prices, providing indicators of neighborhood economic status. We use the "Neighborhoods San Francisco 2004" GeoJSON dataset to visualize neighborhood boundaries, which helps us relate fire rescue resource allocation to economic level.
By combining these datasets, we aim to uncover any potential correlations between fire incidents, response times, and neighborhood economic levels, aiding in better understanding and potentially improving fire emergency services in San Francisco.
What was your goal for the end user's experience?¶
Revealing fire emergency patterns in both time and location, along with the distribution of rescue resources in San Francisco, and answering the question: in which area can you live to get rescued faster?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# NOTE: loading the full CSV takes a long time
firedf = pd.read_csv("Fire_Department_Calls_for_Service_20240414.csv")
C:\Users\ALW\AppData\Local\Temp\ipykernel_18024\1740023720.py:2: DtypeWarning: Columns (19,20,25) have mixed types. Specify dtype option on import or set low_memory=False.
firedf = pd.read_csv("Fire_Department_Calls_for_Service_20240414.csv")
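The DtypeWarning above arises because pandas infers column dtypes chunk by chunk on large files. As the message suggests, passing `low_memory=False` (or pinning dtypes explicitly) avoids it; a minimal self-contained illustration using an in-memory CSV (the column names here are made up for the example):

```python
import io

import pandas as pd

# A column that mixes numbers and strings would normally trigger DtypeWarning
# on a large file; pinning its dtype up front keeps it consistent.
csv_text = "a,b\n1,x\n2,3\n"
df = pd.read_csv(io.StringIO(csv_text), dtype={"b": str}, low_memory=False)
print(df["b"].tolist())  # ['x', '3'] — column kept as strings
```

For the actual dataset, `pd.read_csv("Fire_Department_Calls_for_Service_20240414.csv", low_memory=False)` would silence the warning at the cost of higher peak memory.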
Fire Calls:
Data Reduction: Because the large fire-call dataset was slow to process, we narrowed our focus to fire incidents in San Francisco from 2019 to 2023, filtering for call types specifically related to fire incidents.
Time Data Establishment: We calculated the time gap from when a call was received to the fire department's arrival on scene. Subsequently, we determined the average response time for each neighborhood.
Eliminate errors: We printed the minimum and maximum values and removed unreasonable data, such as time differences less than or equal to zero.
Rent:
Selected representative rent data: Since there were no significant changes in San Francisco's economic development ranking between 2019 and 2023, we chose the latest rent statistics from 2023 to represent the area's economic status. To ensure accuracy, we only utilized occupancy situations classified as "occupied by non-owner," removing outliers where rent equals 0.
Calculate average rent per square foot: Because both area and rent are reported as intervals, we computed the average of the upper and lower bounds of each interval to represent the property's area and rent. We then divided the average rent by the average area to determine the average rent per square foot for each neighborhood.
GeoJSON:
- Consistent Area Division: Because the three datasets differed in their geographical divisions (place names and regions), we manually resolved these inconsistencies by referencing internet sources.
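A minimal sketch of the interval-midpoint calculation described above (the interval strings here are illustrative of the dataset's format; the notebook's full parsing functions appear later):

```python
def midpoint(interval: str) -> float:
    """Midpoint of an interval string such as '$2500-$2999' or '750-999'."""
    bounds = [float(b.replace("$", "").replace("+", ""))
              for b in interval.split("-")]
    return sum(bounds) / len(bounds)

# Illustrative values: mean rent divided by mean area gives rent per sq. ft.
rent_mid = midpoint("$2500-$2999")    # 2749.5
area_mid = midpoint("750-999")        # 874.5
print(round(rent_mid / area_mid, 2))  # 3.14
```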
wanted_call_types = ["Alarms", "Electrical Hazard", "Explosion", "Lightning Strike (Investigation)", "Marine Fire", "Outside Fire",
"Smoke Investigation (Outside)", "Structure Fire", "Train / Rail Fire", "Vehicle Fire"]
firedf = firedf[firedf["Call Type"].isin(wanted_call_types)]
firedf['Call Date'] = pd.to_datetime(firedf['Call Date'])
firedf.index = firedf["Call Date"]
firedf = firedf.sort_index()
firedf = firedf.loc["2019-01-01":"2023-12-31"]
firedf = firedf[firedf["Received DtTm"].notnull()]
firedf = firedf[firedf["On Scene DtTm"].notnull()]
firedf = firedf[firedf["Neighborhooods - Analysis Boundaries"].notnull()]
firedf = firedf[~firedf["Neighborhooods - Analysis Boundaries"].isin(["Treasure Island"])]
firedf["Neighborhooods - Analysis Boundaries"] = firedf["Neighborhooods - Analysis Boundaries"].replace({"Hayes Valley": "Western Addition",
"Lone Mountain/USF": "Inner Richmond",
"McLaren Park": "Visitacion Valley","Japantown": "Western Addition",
"Lincoln Park": "Seacliff","Oceanview/Merced/Ingleside": "Ocean View",
"Financial District/South Beach": "Financial District","Portola": "Excelsior",
"Tenderloin": "Downtown/Civic Center","Mission Bay": "South of Market",
"Bayview Hunters Point": "Bayview"})
firedf = firedf[firedf["Received DtTm"] != firedf["On Scene DtTm"]]
received_series = pd.to_datetime(firedf["Received DtTm"])
on_scene_series = pd.to_datetime(firedf["On Scene DtTm"])
response_times = (on_scene_series-received_series).astype('timedelta64[ns]')
firedf["Response times"] = response_times
firedf = firedf[response_times.dt.total_seconds() > 0]
firedf.index = pd.to_datetime(firedf["Received DtTm"])
firedf = firedf.sort_index()
firedf
| Call Number | Unit ID | Incident Number | Call Type | Call Date | Watch Date | Received DtTm | Entry DtTm | Dispatch DtTm | Response DtTm | ... | Unit sequence in call dispatch | Fire Prevention District | Supervisor District | Neighborhooods - Analysis Boundaries | RowID | case_location | data_as_of | data_loaded_at | Analysis Neighborhoods | Response times | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Received DtTm | |||||||||||||||||||||
| 2019-01-01 00:07:30 | 190010050 | B08 | 19000005 | Outside Fire | 2019-01-01 | 12/31/2018 | 01/01/2019 12:07:30 AM | 01/01/2019 12:08:16 AM | 01/01/2019 12:08:34 AM | 01/01/2019 12:10:44 AM | ... | 2.0 | 9.0 | 7.0 | West of Twin Peaks | 190010050-B08 | POINT (-122.45801 37.736103) | 02/05/2024 03:27:52 AM | 02/05/2024 10:56:25 AM | 41.0 | 0 days 00:06:56 |
| 2019-01-01 00:07:30 | 190010050 | E40 | 19000005 | Outside Fire | 2019-01-01 | 12/31/2018 | 01/01/2019 12:07:30 AM | 01/01/2019 12:08:16 AM | 01/01/2019 12:08:34 AM | 01/01/2019 12:12:11 AM | ... | 6.0 | 9.0 | 7.0 | West of Twin Peaks | 190010050-E40 | POINT (-122.45801 37.736103) | 02/05/2024 03:27:52 AM | 02/05/2024 10:56:25 AM | 41.0 | 0 days 00:11:29 |
| 2019-01-01 00:07:30 | 190010050 | T19 | 19000005 | Outside Fire | 2019-01-01 | 12/31/2018 | 01/01/2019 12:07:30 AM | 01/01/2019 12:08:16 AM | 01/01/2019 12:08:34 AM | 01/01/2019 12:09:50 AM | ... | 5.0 | 9.0 | 7.0 | West of Twin Peaks | 190010050-T19 | POINT (-122.45801 37.736103) | 02/05/2024 03:27:52 AM | 02/05/2024 10:56:25 AM | 41.0 | 0 days 00:09:44 |
| 2019-01-01 00:07:30 | 190010050 | E15 | 19000005 | Outside Fire | 2019-01-01 | 12/31/2018 | 01/01/2019 12:07:30 AM | 01/01/2019 12:08:16 AM | 01/01/2019 12:08:34 AM | 01/01/2019 12:10:15 AM | ... | 7.0 | 9.0 | 7.0 | West of Twin Peaks | 190010050-E15 | POINT (-122.45801 37.736103) | 02/05/2024 03:27:52 AM | 02/05/2024 10:56:25 AM | 41.0 | 0 days 00:27:03 |
| 2019-01-01 00:07:30 | 190010050 | E39 | 19000005 | Outside Fire | 2019-01-01 | 12/31/2018 | 01/01/2019 12:07:30 AM | 01/01/2019 12:08:16 AM | 01/01/2019 12:08:34 AM | 01/01/2019 12:10:11 AM | ... | 1.0 | 9.0 | 7.0 | West of Twin Peaks | 190010050-E39 | POINT (-122.45801 37.736103) | 02/05/2024 03:27:52 AM | 02/05/2024 10:56:25 AM | 41.0 | 0 days 00:06:39 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-12-31 23:08:10 | 233653065 | E03 | 23177194 | Alarms | 2023-12-31 | 12/31/2023 | 12/31/2023 11:08:10 PM | 12/31/2023 11:09:21 PM | 12/31/2023 11:09:28 PM | 12/31/2023 11:10:16 PM | ... | 1.0 | 2.0 | 6.0 | Downtown/Civic Center | 233653065-E03 | POINT (-122.41424 37.783722) | 02/05/2024 03:27:52 AM | 02/05/2024 10:56:25 AM | 36.0 | 0 days 00:04:46 |
| 2023-12-31 23:30:19 | 233653128 | B01 | 23177205 | Alarms | 2023-12-31 | 12/31/2023 | 12/31/2023 11:30:19 PM | 12/31/2023 11:32:04 PM | 12/31/2023 11:33:59 PM | 12/31/2023 11:36:07 PM | ... | 3.0 | 1.0 | 3.0 | North Beach | 233653128-B01 | POINT (-122.40815 37.803432) | 02/05/2024 03:27:52 AM | 02/05/2024 10:56:25 AM | 23.0 | 0 days 00:09:13 |
| 2023-12-31 23:30:19 | 233653128 | E28 | 23177205 | Alarms | 2023-12-31 | 12/31/2023 | 12/31/2023 11:30:19 PM | 12/31/2023 11:32:04 PM | 12/31/2023 11:33:59 PM | 12/31/2023 11:35:24 PM | ... | 1.0 | 1.0 | 3.0 | North Beach | 233653128-E28 | POINT (-122.40815 37.803432) | 02/05/2024 03:27:52 AM | 02/05/2024 10:56:25 AM | 23.0 | 0 days 00:06:03 |
| 2023-12-31 23:30:19 | 233653128 | T02 | 23177205 | Alarms | 2023-12-31 | 12/31/2023 | 12/31/2023 11:30:19 PM | 12/31/2023 11:32:04 PM | 12/31/2023 11:33:59 PM | 12/31/2023 11:36:17 PM | ... | 2.0 | 1.0 | 3.0 | North Beach | 233653128-T02 | POINT (-122.40815 37.803432) | 02/05/2024 03:27:52 AM | 02/05/2024 10:56:25 AM | 23.0 | 0 days 00:08:56 |
| 2023-12-31 23:32:43 | 233653133 | E21 | 23177207 | Outside Fire | 2023-12-31 | 12/31/2023 | 12/31/2023 11:32:43 PM | 12/31/2023 11:33:21 PM | 12/31/2023 11:34:33 PM | 12/31/2023 11:35:17 PM | ... | 1.0 | 5.0 | 1.0 | Inner Richmond | 233653133-E21 | POINT (-122.453 37.77497) | 02/05/2024 03:27:52 AM | 02/05/2024 10:56:25 AM | 18.0 | 0 days 00:05:48 |
191919 rows × 38 columns
def convert_point_to_tuple(point_entry):
    # Parse a WKT-style "POINT (lon lat)" string into a (lon, lat) float tuple
    # without resorting to eval.
    lon, lat = point_entry.split("POINT ")[1].strip("()").split(" ")
    return (float(lon), float(lat))
rentdf = pd.read_csv("Rent_Ordinance_Housing_Inventory_20240505.csv")
rentdf = rentdf[rentdf["submission_year"]==2023]
rentdf = rentdf[rentdf["occupancy_type"]=="Occupied by non-owner"]
rentdf = rentdf[rentdf["analysis_neighborhood"].notnull()]
rentdf["analysis_neighborhood"] = rentdf["analysis_neighborhood"].replace({"Hayes Valley": "Western Addition",
"Lone Mountain/USF": "Inner Richmond",
"McLaren Park": "Visitacion Valley","Japantown": "Western Addition",
"Lincoln Park": "Seacliff","Oceanview/Merced/Ingleside": "Ocean View",
"Financial District/South Beach": "Financial District","Portola": "Excelsior",
"Tenderloin": "Downtown/Civic Center","Mission Bay": "South of Market",
"Bayview Hunters Point": "Bayview"})
rentdf = rentdf[rentdf["point"].notnull()]
rentdf = rentdf[rentdf["monthly_rent"]!="$0 (no rent paid by the occupant)"]
rentdf = rentdf[rentdf["square_footage"]!="Unknown"]
rentdf["point"] = rentdf["point"].map(convert_point_to_tuple)
def get_mean_rent(elem):
    # Rent is reported as a range such as "$1000-$1499" or an open-ended
    # "$7000+"; use the midpoint of a range, or the single bound otherwise.
    numbers = elem.split("-$")
    if len(numbers) == 1:
        return int(numbers[0].replace("$", "").replace("+", ""))
    elif len(numbers) == 2:
        numbers[0] = int(numbers[0].replace("$", "").replace("+", ""))
        numbers[1] = int(numbers[1].replace("$", "").replace("+", ""))
        return sum(numbers) / 2
    else:
        raise ValueError(f"Unexpected rent format: {elem} {numbers}")
rentdf["true_monthly_mean"] = rentdf["monthly_rent"].map(get_mean_rent)
def get_mean_footage(elem):
    # Square footage is reported as a range such as "750-999 Sq.Ft" or an
    # open-ended "3000+ Sq.Ft"; use the midpoint of a range, or the single
    # bound otherwise.
    numbers = elem.split(" Sq.Ft")[0].split("-")
    if len(numbers) == 1:
        return int(numbers[0].split("+")[0])
    elif len(numbers) == 2:
        return (int(numbers[0]) + int(numbers[1])) / 2
rentdf["true_square_footage"] = rentdf["square_footage"].map(get_mean_footage)
rentdf
| unique_id | block_num | unit_count | case_type_name | submission_year | block_address | occupancy_type | occupancy_or_vacancy_date | occupancy_or_vacancy_date_year | bedroom_count | ... | signature_date | occupancy_or_vacancy_date_history | year_property_built | point | analysis_neighborhood | supervisor_district | data_as_of | data_loaded_at | true_monthly_mean | true_square_footage | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5784430458147580041 | 0750 | 142 | Housing Inventory - Unit information (2023) | 2023 | 1400 Block of TURK ST | Occupied by non-owner | 2021/03/17 | 2021 | Two-Bedroom | ... | 2023/02/28 | NaN | 1993.0 | (-122.432973442, 37.780445712) | Western Addition | 5.0 | 2024/05/04 12:00:00 AM | 2024/05/05 06:07:07 AM | 3125.5 | 875.5 |
| 1 | 4954451792168111171 | 0750 | 142 | Housing Inventory - Unit information (2023) | 2023 | 1400 Block of TURK ST | Occupied by non-owner | 2016/03/12 | 2016 | Studio | ... | 2023/02/28 | NaN | 1993.0 | (-122.432973442, 37.780445712) | Western Addition | 5.0 | 2024/05/04 12:00:00 AM | 2024/05/05 06:07:07 AM | 1375.5 | 375.5 |
| 2 | -5063247231305562579 | 0750 | 142 | Housing Inventory - Unit information (2023) | 2023 | 1400 Block of TURK ST | Occupied by non-owner | 2021/12/15 | 2021 | Studio | ... | 2023/02/28 | NaN | 1993.0 | (-122.432973442, 37.780445712) | Western Addition | 5.0 | 2024/05/04 12:00:00 AM | 2024/05/05 06:07:07 AM | 2375.5 | 625.5 |
| 3 | -9079354581068241121 | 0750 | 142 | Housing Inventory - Unit information (2023) | 2023 | 1400 Block of TURK ST | Occupied by non-owner | 2010/02/01 | 2010 | One-Bedroom | ... | 2023/02/28 | NaN | 1993.0 | (-122.432973442, 37.780445712) | Western Addition | 5.0 | 2024/05/04 12:00:00 AM | 2024/05/05 06:07:07 AM | 1875.5 | 875.5 |
| 4 | 756572995386429446 | 0750 | 142 | Housing Inventory - Unit information (2023) | 2023 | 1400 Block of TURK ST | Occupied by non-owner | 2004/06/26 | 2004 | One-Bedroom | ... | 2023/02/28 | NaN | 1993.0 | (-122.432973442, 37.780445712) | Western Addition | 5.0 | 2024/05/04 12:00:00 AM | 2024/05/05 06:07:07 AM | 2875.5 | 625.5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 152093 | 4047857095214640194 | 0065 | 74 | Housing Inventory - Unit information (2023) | 2023 | 900 Block of COLUMBUS AVE | Occupied by non-owner | 2022/10/27 | 2022 | Studio | ... | 2023/02/17 | [\n {\n "date_range_type": "Occupied",\n ... | 1916.0 | (-122.414343193, 37.80310132) | Russian Hill | 3.0 | 2024/05/04 12:00:00 AM | 2024/05/05 06:07:07 AM | 1375.5 | 125.0 |
| 152094 | 8790490401777245212 | 0065 | 74 | Housing Inventory - Unit information (2023) | 2023 | 900 Block of COLUMBUS AVE | Occupied by non-owner | 2017/12/14 | 2017 | Studio | ... | 2023/02/17 | NaN | 1916.0 | (-122.414343193, 37.80310132) | Russian Hill | 3.0 | 2024/05/04 12:00:00 AM | 2024/05/05 06:07:07 AM | 1125.5 | 125.0 |
| 152095 | 4275828707947196532 | 0065 | 74 | Housing Inventory - Unit information (2023) | 2023 | 900 Block of COLUMBUS AVE | Occupied by non-owner | 2015/04/23 | 2015 | Studio | ... | 2023/02/17 | NaN | 1916.0 | (-122.414343193, 37.80310132) | Russian Hill | 3.0 | 2024/05/04 12:00:00 AM | 2024/05/05 06:07:07 AM | 1125.5 | 125.0 |
| 152096 | 1350372243345719808 | 0065 | 74 | Housing Inventory - Unit information (2023) | 2023 | 900 Block of COLUMBUS AVE | Occupied by non-owner | 2020/08/07 | 2020 | Studio | ... | 2023/02/17 | NaN | 1916.0 | (-122.414343193, 37.80310132) | Russian Hill | 3.0 | 2024/05/04 12:00:00 AM | 2024/05/05 06:07:07 AM | 1125.5 | 125.0 |
| 152097 | 6894355032800143611 | 0065 | 74 | Housing Inventory - Unit information (2023) | 2023 | 900 Block of COLUMBUS AVE | Occupied by non-owner | 2022/11/15 | 2022 | Studio | ... | 2023/02/17 | [\n {\n "date_range_type": "Occupied",\n ... | 1916.0 | (-122.414343193, 37.80310132) | Russian Hill | 3.0 | 2024/05/04 12:00:00 AM | 2024/05/05 06:07:07 AM | 1375.5 | 125.0 |
55743 rows × 30 columns
districts = ["Golden Gate Park","Sunset/Parkside","Seacliff","Presidio","Outer Richmond","Inner Sunset","Haight Ashbury","Presidio Heights",
"Inner Richmond","North Beach","Russian Hill","Nob Hill","Chinatown","Downtown/Civic Center","Financial District","Marina",
"Western Addition","Pacific Heights","South of Market","Mission","Bayview","Potrero Hill","Bernal Heights","Excelsior",
"Visitacion Valley","Ocean View","Lakeshore","Glen Park","Twin Peaks","Castro/Upper Market","Noe Valley","Outer Mission",
"West of Twin Peaks"]
districtdict = {}
for district in districts:
    districtdf = firedf[firedf["Neighborhooods - Analysis Boundaries"] == district]
    rentdistrictdf = rentdf[rentdf["analysis_neighborhood"] == district]
    d_r_t = districtdf["Response times"]
    meanrent = np.mean(rentdistrictdf["true_monthly_mean"])
    meanfootage = np.mean(rentdistrictdf["true_square_footage"])
    # Presidio has no usable rent data, so leave its rent-per-square-foot empty.
    mean_per_mean = meanrent / meanfootage if district != "Presidio" else None
    districtdict.update({district: [np.mean(d_r_t.dt.total_seconds() / 60),
                                    min(d_r_t.dt.total_seconds() / 60),
                                    max(d_r_t.dt.total_seconds() / 60),
                                    mean_per_mean]})
district_df = pd.DataFrame.from_dict(districtdict,orient="index")
district_df = district_df.reset_index()
district_df = district_df.rename(columns={"index": "NEIGHBORHO", 0: "Mean response time (min)", 1: "Fastest response time", 2:"Slowest response time",
3:"Mean rent ($/ft^2)"})
district_df
| NEIGHBORHO | Mean response time (min) | Fastest response time | Slowest response time | Mean rent ($/ft^2) | |
|---|---|---|---|---|---|
| 0 | Golden Gate Park | 8.285412 | 1.250000 | 281.716667 | 4.396416 |
| 1 | Sunset/Parkside | 7.792845 | 0.350000 | 263.350000 | 2.659374 |
| 2 | Seacliff | 8.268844 | 0.800000 | 82.200000 | 2.669059 |
| 3 | Presidio | 8.767269 | 1.966667 | 135.550000 | NaN |
| 4 | Outer Richmond | 7.193393 | 0.100000 | 107.650000 | 2.745444 |
| 5 | Inner Sunset | 7.218015 | 1.533333 | 90.300000 | 3.124437 |
| 6 | Haight Ashbury | 7.040618 | 0.100000 | 95.016667 | 3.419354 |
| 7 | Presidio Heights | 6.909119 | 1.116667 | 240.266667 | 3.208674 |
| 8 | Inner Richmond | 6.985809 | 1.633333 | 152.250000 | 3.050679 |
| 9 | North Beach | 6.732717 | 1.550000 | 55.416667 | 3.793911 |
| 10 | Russian Hill | 7.052900 | 0.950000 | 205.633333 | 3.951208 |
| 11 | Nob Hill | 6.352649 | 0.900000 | 138.466667 | 3.588992 |
| 12 | Chinatown | 6.167121 | 0.583333 | 72.483333 | 3.294388 |
| 13 | Downtown/Civic Center | 6.490172 | 0.516667 | 113.683333 | 3.687266 |
| 14 | Financial District | 7.342674 | 0.200000 | 191.316667 | 4.235989 |
| 15 | Marina | 7.030896 | 0.783333 | 52.766667 | 3.773530 |
| 16 | Western Addition | 6.349658 | 0.066667 | 222.666667 | 3.583172 |
| 17 | Pacific Heights | 6.276151 | 0.516667 | 73.066667 | 3.510876 |
| 18 | South of Market | 6.771137 | 0.333333 | 107.500000 | 3.879805 |
| 19 | Mission | 6.761253 | 0.233333 | 224.433333 | 3.116139 |
| 20 | Bayview | 8.551874 | 0.083333 | 199.183333 | 2.458615 |
| 21 | Potrero Hill | 7.868106 | 1.450000 | 256.983333 | 3.640064 |
| 22 | Bernal Heights | 7.967816 | 0.900000 | 215.566667 | 3.249949 |
| 23 | Excelsior | 8.031674 | 1.050000 | 103.316667 | 2.439387 |
| 24 | Visitacion Valley | 10.061952 | 1.383333 | 68.666667 | 2.369647 |
| 25 | Ocean View | 7.682359 | 0.283333 | 249.150000 | 2.523348 |
| 26 | Lakeshore | 7.815318 | 1.800000 | 112.966667 | 2.577835 |
| 27 | Glen Park | 8.137615 | 1.200000 | 103.716667 | 3.069781 |
| 28 | Twin Peaks | 8.480395 | 1.366667 | 61.500000 | 3.125427 |
| 29 | Castro/Upper Market | 6.741658 | 0.766667 | 207.516667 | 3.412938 |
| 30 | Noe Valley | 6.838826 | 0.683333 | 246.000000 | 3.442789 |
| 31 | Outer Mission | 7.867571 | 0.900000 | 180.500000 | 2.986147 |
| 32 | West of Twin Peaks | 7.633971 | 0.066667 | 111.933333 | 2.808919 |
tempdf = district_df.sort_values(by=["Mean response time (min)"])
tempdf["Speed rank"]=list(range(0,len(district_df)))
district_df = tempdf.sort_index()
tempdf = district_df.sort_values(by=["Mean rent ($/ft^2)"])
tempdf["Rent rank"]=list(range(len(district_df)-1,-1,-1))
district_df = tempdf.sort_index()
district_df
| NEIGHBORHO | Mean response time (min) | Fastest response time | Slowest response time | Mean rent ($/ft^2) | Speed rank | Rent rank | |
|---|---|---|---|---|---|---|---|
| 0 | Golden Gate Park | 8.285412 | 1.250000 | 281.716667 | 4.396416 | 28 | 1 |
| 1 | Sunset/Parkside | 7.792845 | 0.350000 | 263.350000 | 2.659374 | 20 | 27 |
| 2 | Seacliff | 8.268844 | 0.800000 | 82.200000 | 2.669059 | 27 | 26 |
| 3 | Presidio | 8.767269 | 1.966667 | 135.550000 | NaN | 31 | 0 |
| 4 | Outer Richmond | 7.193393 | 0.100000 | 107.650000 | 2.745444 | 15 | 25 |
| 5 | Inner Sunset | 7.218015 | 1.533333 | 90.300000 | 3.124437 | 16 | 19 |
| 6 | Haight Ashbury | 7.040618 | 0.100000 | 95.016667 | 3.419354 | 13 | 13 |
| 7 | Presidio Heights | 6.909119 | 1.116667 | 240.266667 | 3.208674 | 10 | 17 |
| 8 | Inner Richmond | 6.985809 | 1.633333 | 152.250000 | 3.050679 | 11 | 22 |
| 9 | North Beach | 6.732717 | 1.550000 | 55.416667 | 3.793911 | 5 | 5 |
| 10 | Russian Hill | 7.052900 | 0.950000 | 205.633333 | 3.951208 | 14 | 3 |
| 11 | Nob Hill | 6.352649 | 0.900000 | 138.466667 | 3.588992 | 3 | 9 |
| 12 | Chinatown | 6.167121 | 0.583333 | 72.483333 | 3.294388 | 0 | 15 |
| 13 | Downtown/Civic Center | 6.490172 | 0.516667 | 113.683333 | 3.687266 | 4 | 7 |
| 14 | Financial District | 7.342674 | 0.200000 | 191.316667 | 4.235989 | 17 | 2 |
| 15 | Marina | 7.030896 | 0.783333 | 52.766667 | 3.773530 | 12 | 6 |
| 16 | Western Addition | 6.349658 | 0.066667 | 222.666667 | 3.583172 | 2 | 10 |
| 17 | Pacific Heights | 6.276151 | 0.516667 | 73.066667 | 3.510876 | 1 | 11 |
| 18 | South of Market | 6.771137 | 0.333333 | 107.500000 | 3.879805 | 8 | 4 |
| 19 | Mission | 6.761253 | 0.233333 | 224.433333 | 3.116139 | 7 | 20 |
| 20 | Bayview | 8.551874 | 0.083333 | 199.183333 | 2.458615 | 30 | 30 |
| 21 | Potrero Hill | 7.868106 | 1.450000 | 256.983333 | 3.640064 | 23 | 8 |
| 22 | Bernal Heights | 7.967816 | 0.900000 | 215.566667 | 3.249949 | 24 | 16 |
| 23 | Excelsior | 8.031674 | 1.050000 | 103.316667 | 2.439387 | 25 | 31 |
| 24 | Visitacion Valley | 10.061952 | 1.383333 | 68.666667 | 2.369647 | 32 | 32 |
| 25 | Ocean View | 7.682359 | 0.283333 | 249.150000 | 2.523348 | 19 | 29 |
| 26 | Lakeshore | 7.815318 | 1.800000 | 112.966667 | 2.577835 | 21 | 28 |
| 27 | Glen Park | 8.137615 | 1.200000 | 103.716667 | 3.069781 | 26 | 21 |
| 28 | Twin Peaks | 8.480395 | 1.366667 | 61.500000 | 3.125427 | 29 | 18 |
| 29 | Castro/Upper Market | 6.741658 | 0.766667 | 207.516667 | 3.412938 | 6 | 14 |
| 30 | Noe Valley | 6.838826 | 0.683333 | 246.000000 | 3.442789 | 9 | 12 |
| 31 | Outer Mission | 7.867571 | 0.900000 | 180.500000 | 2.986147 | 22 | 23 |
| 32 | West of Twin Peaks | 7.633971 | 0.066667 | 111.933333 | 2.808919 | 18 | 24 |
During our initial investigation, we looked into how fire incidents were spread out over different time periods. We noticed that the number of incidents varied widely across different types, and there wasn't any clear pattern over time.
Additionally, discrepancies in how areas were named across the three datasets made it harder to study differences between regions, complicating our goal of understanding regional variation.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
monthsnums = list(range(0, 12, 1))
num_rows = len(wanted_call_types) // 2 + len(wanted_call_types) % 2
num_cols = 2
fig, axes = plt.subplots(num_rows, num_cols, figsize=(20, 30))
plt.subplots_adjust(left=0.125, right=0.9, bottom=0.1, top=0.9, wspace=0.2, hspace=0.5)  # adjust layout parameters
for i, calltype in enumerate(wanted_call_types):
    calltypedf = firedf[firedf["Call Type"] == calltype]
    monthdict = {}
    for j, month in enumerate(months):
        monthdict.update({month: len(calltypedf[calltypedf.index.month == j + 1])})
    plot_df = pd.DataFrame([monthdict])
    ax = axes[i // num_cols, i % num_cols]
    plot_df.iloc[0].plot.bar(ax=ax)
    ax.set_ylabel("Number of incidents")
    ax.set_xlabel("Month")
    ax.set_xticks(monthsnums)
    ax.set_xticklabels(months)
    ax.set_title(f"Number of Calls of Type: '{calltype}'")
plt.show()
hournums = list(range(0, 24, 1))
num_rows = len(wanted_call_types) // 2 + len(wanted_call_types) % 2
num_cols = 2
fig, axes = plt.subplots(num_rows, num_cols, figsize=(12, 30))
plt.subplots_adjust(left=0.125, right=0.9, bottom=0.1, top=0.9, wspace=0.2, hspace=0.5)
for i, calltype in enumerate(wanted_call_types):
    calltypedf = firedf[firedf["Call Type"] == calltype]
    hourdict = {}
    for hour in hournums:
        hourdict.update({hour: len(calltypedf[calltypedf.index.hour == hour])})
    plot_df = pd.DataFrame([hourdict])
    ax = axes[i // num_cols, i % num_cols]
    plot_df.iloc[0].plot.bar(ax=ax)
    ax.set_ylabel("Number of incidents")
    ax.set_xlabel("Hour of day")
    ax.set_title(f"Number of Calls of Type: '{calltype}'")
plt.show()
yearnums = list(range(2019, 2023+1, 1))
num_rows = 1
num_cols = 1
datadict = {year: [0 for _ in wanted_call_types] for year in yearnums}
for year in yearnums:
    for i, calltype in enumerate(wanted_call_types):
        calltypedf = firedf[firedf["Call Type"] == calltype]
        datadict[year][i] = len(calltypedf[calltypedf.index.year == year])
plot_df = pd.DataFrame.from_dict(datadict,orient='index',columns=wanted_call_types)
ax = plot_df.plot.bar(stacked=True,colormap="tab20b")
ax.set_ylabel("Number of incidents")
ax.set_xlabel("Year")
ax.set_title("Number of Fire Incidents from 2019 to 2023")
leg = plt.legend(loc='upper right')
plt.draw()  # Draw the figure so we can find the position of the legend.
# Get the bounding box of the original legend
bb = leg.get_bbox_to_anchor().transformed(ax.transAxes.inverted())
# Change to location of the legend.
xOffset = 0.6
bb.x0 += xOffset
bb.x1 += xOffset
leg.set_bbox_to_anchor(bb, transform = ax.transAxes)
plt.show()
import plotly.express as px
import plotly.io as pio
import geopandas as gpd
import shapely
gdf = gpd.read_file("ark28722-s75c8t-geojson.json")
gdf.to_crs(epsg=4326, inplace=True)
merge_gdf1 = gdf[gdf["NEIGHBORHO"].isin(["Outer Sunset", "Parkside"])]
merge_gdf2 = gdf[gdf["NEIGHBORHO"].isin(["Glen Park", "Diamond Heights"])]
merge_gdf3 = gdf[gdf["NEIGHBORHO"].isin(["Visitacion Valley", "Crocker Amazon"])]
gdf = gdf[~gdf["NEIGHBORHO"].isin(["Crocker Amazon", "Parkside", "Diamond Heights"])]
gdf[gdf["NEIGHBORHO"]=="Outer Sunset"]=["s75c8t.2",2,"Sunset/Parkside",shapely.unary_union(merge_gdf1["geometry"])]
gdf[gdf["NEIGHBORHO"]=="Glen Park"]=["s75c8t.30",30,"Glen Park",shapely.unary_union(merge_gdf2["geometry"])]
gdf[gdf["NEIGHBORHO"]=="Visitacion Valley"]=["s75c8t.25",25,"Visitacion Valley",shapely.unary_union(merge_gdf3["geometry"])]
gdf.set_index('NEIGHBORHO', inplace=True)
fig = px.choropleth_mapbox(district_df, geojson=gdf["geometry"], locations=gdf.index,
                           color='Mean response time (min)',
                           color_continuous_scale="temps", hover_data=["Speed rank"],
                           mapbox_style="carto-positron",
                           zoom=11, center={"lat": 37.773972, "lon": -122.431297},
                           opacity=0.5)
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.show()
pio.write_html(fig, file='map_t.html', auto_open=True)
fig = px.choropleth_mapbox(district_df, geojson=gdf["geometry"], locations=gdf.index,
                           color='Mean rent ($/ft^2)',
                           color_continuous_scale="temps", hover_data=["Rent rank"],
                           mapbox_style="carto-positron",
                           zoom=11, center={"lat": 37.773972, "lon": -122.431297},
                           opacity=0.5)
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.show()
pio.write_html(fig, file='map_r.html', auto_open=True)
After our initial data check, we decided to switch gears. Rather than focusing on when fire incidents occur, we shifted our attention to understanding how economic factors influence emergency response times. We tidied up and organized our data, then visualized it using two maps: one showing response times by neighborhood and another displaying rent prices per square foot. We compared these across the entire city and different areas.
Additionally, we ranked response times and rent prices, providing hover labels with average values and rankings for clarity.
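The rank labels shown on hover can be produced with `Series.rank()`; a minimal sketch with made-up per-neighborhood means (the column names mirror those used in the maps above):

```python
import pandas as pd

# Hypothetical per-neighborhood mean response times (minutes).
df = pd.DataFrame({"Mean response time (min)": [5.2, 4.1, 6.3]},
                  index=["Mission", "Marina", "Bayview"])

# Lower response time = faster = better rank (1 is fastest).
df["Speed rank"] = df["Mean response time (min)"].rank().astype(int)
print(df.loc["Marina", "Speed rank"])  # 1
```

The same pattern gives a "Rent rank" from a mean-rent column.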
In conclusion, we found that, with a few exceptions, neighborhoods with stronger local economies tend to receive quicker emergency responses.
If relevant, talk about your machine-learning.
We mostly used an LLM to check our phrasing; for coding problems, we preferred to ask a real person.
The genre of the story is magazine style, which is the most common genre for static visualizations but has not been as richly utilized with interactive visualizations.
Consistent Visual Platform, Progress Bar / Timebar and Zooming: We used these tools because they helped us maintain a consistent visual platform, ensuring coherence, and reinforcing the narrative's identity. Consistency helps guide the audience's attention and enhances the overall storytelling experience. Within the visual narrative, a progress bar or timebar can be used to indicate the passage of time or the progression of events, which was really helpful for the introduction of our thesis.
Introductory Text, Accompanying Article and Hover Detail
Creating an introductory text for a narrative structure involves setting the stage, introducing key elements, and enticing readers to engage with the story. In a visual narrative, stimulating default views refers to creating visually engaging and compelling scenes or perspectives that captivate the audience's attention from the start.
The "hover detail tool" refers to a feature commonly used in interactive data visualizations. When a user hovers their cursor over a specific data point or element in the visualization, additional details or information about that point are displayed. This tool provides users with instant access to contextual information, enabling them to explore and interpret the data more effectively.
We chose two types of visualization: static and interactive. Static visualization, which is commonly used in magazine-style formats, especially in online journalism, was ideal for introducing the topic of our project.
On the other hand, interactive visualization served to capture the reader's attention and encourage engagement with the data. This flexibility allows for deeper exploration of complex datasets, facilitating the discovery of patterns, trends, and outliers that might otherwise go unnoticed.
Furthermore, interactive visualizations facilitate communication, especially of complex data. In our interactive map, we ranked response times and rent prices, providing hover labels with average values and rankings for clarity. We made this choice because it is hard to distinguish similar colors and to compare large sets of numbers quickly; ranking makes the comparison immediate for both researchers and readers.
Our choice of datasets worked well: the final results strongly supported the hypothesis that economic status is positively correlated with emergency response speed.
A fully compatible area division would have helped. We couldn't find a polygon dataset that exactly matched the neighborhood divisions in the fire-call and rent data, so we had to merge the disputed areas by hand.
Better selection of datasets representing regional economic status could also help: while rent reflects market demand, it may not accurately capture regional economic conditions or government tax revenue.
Further refinement in the preliminary screening of fire calls could enhance accuracy. The current exclusion of cases with arrival times less than or equal to 0 might affect the statistical results.
When measuring response time, using the average to represent response speed may not be precise enough. Despite filtering out outright unreasonable data, some extreme values still exist in the records. In such cases, using the median might better represent response speed.
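A small illustration of why the median is more robust here, using made-up response times with one extreme outlier:

```python
import pandas as pd

# Five response times (minutes); the 60-minute record is an outlier.
times = pd.Series([3.0, 4.0, 4.5, 5.0, 60.0])
print(times.mean())    # 15.3 — dragged up by the single outlier
print(times.median())  # 4.5  — unaffected by it
```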
Ying Lu:
- Choose datasets
- Data cleaning and processing
- Data visualization
- Building the webpage
Susanna Porcell:
- Choose datasets
- Design of the genre
- Decide the format for visualization
- Data analysis
- Telling the story of the results